image domain
The supplementary materials include a detailed description of the implementation used in the experiments.
We use BLIP-2 models built on the FLAN-T5 language model family, with the same padding side as the FLAN-T5 models and a batch size of 8 for all datasets and models. The Q-Former is kept in full precision. To produce decompositions, we use beam-search multinomial sampling with 5 beams and a top-p of 0.95.
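The decoding settings above can be collected into a small configuration object. This is a minimal sketch, not the authors' code: the keyword names `num_beams`, `do_sample`, and `top_p` assume a Hugging Face `transformers`-style `generate` call, which the text does not name, and `DecodingConfig`, `generation_kwargs`, and `batches` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DecodingConfig:
    """Decoding settings stated in the text; field names are an assumption."""
    num_beams: int = 5      # 5 beams
    do_sample: bool = True  # sampling inside beam search (multinomial)
    top_p: float = 0.95     # nucleus sampling threshold
    batch_size: int = 8     # used for all datasets and models


def generation_kwargs(cfg):
    """Return only the keys that would be passed to a generate-style call."""
    kwargs = asdict(cfg)
    kwargs.pop("batch_size")  # batching is handled outside generation
    return kwargs


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Under the stated assumption, a call might look like `model.generate(**inputs, **generation_kwargs(DecodingConfig()))`, with the inputs drawn from `batches(examples, 8)`.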
- North America > Canada > Ontario (0.05)
- South America > Chile (0.04)
- North America > United States > Texas (0.04)
- Media > Film (1.00)
- Leisure & Entertainment > Games > Computer Games (0.68)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Ontario (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Media > Film (1.00)
- Leisure & Entertainment > Games > Computer Games (0.67)
- Information Technology (0.67)
- Leisure & Entertainment > Games > Chess (0.47)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > Canada (0.04)
Aligning Silhouette Topology for Self-Adaptive 3D Human Pose Recovery
Articulation-centric 2D/3D pose supervision forms the core training objective in most existing 3D human pose estimation techniques. Except for synthetic source environments, acquiring such rich supervision for each real target domain at deployment is highly inconvenient. However, we observe that standard foreground silhouette estimation techniques (on static camera feeds) remain unaffected by domain shifts. Motivated by this, we propose a novel target adaptation framework that relies only on silhouette supervision to adapt a source-trained model-based regressor. In the absence of any auxiliary cue (multi-view, depth, or 2D pose), an isolated silhouette loss fails to provide a reliable pose-specific gradient and must be employed in tandem with a topology-centric loss. To this end, we develop a series of convolution-friendly spatial transformations to disentangle a topological-skeleton representation from the raw silhouette. This design makes it possible to devise a Chamfer-inspired spatial topological-alignment loss via distance field computation, while avoiding any gradient-hindering spatial-to-pointset mapping. Experimental results demonstrate that our approach outperforms prior art in self-adapting a source-trained model to diverse unlabeled target domains, such as a) in-the-wild datasets, b) low-resolution image domains, and c) adversarially perturbed image domains (via UAP).
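The idea of a Chamfer-inspired alignment loss computed through distance fields, rather than through point sets, can be illustrated on small binary masks. The following is a rough sketch under strong assumptions, not the method of the paper: it uses a brute-force Euclidean distance field instead of the convolution-friendly transformations the abstract describes, it is not differentiable, and the function names `distance_field`, `one_sided`, and `topology_alignment_loss` are hypothetical.

```python
import math


def distance_field(mask):
    """Distance from every pixel to the nearest foreground pixel (brute force)."""
    on = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v > 0]
    if not on:
        raise ValueError("mask has no foreground pixels")
    return [
        [min(math.hypot(r - pr, c - pc) for pr, pc in on) for c in range(len(row))]
        for r, row in enumerate(mask)
    ]


def one_sided(mask, field):
    """Mean of `field` over the foreground pixels of `mask`."""
    total = sum(sum(row) for row in mask)
    if total == 0:
        return 0.0
    weighted = sum(m * d for mrow, drow in zip(mask, field) for m, d in zip(mrow, drow))
    return weighted / total


def topology_alignment_loss(pred, target):
    """Symmetric Chamfer-style distance evaluated on spatial maps, not point sets."""
    return one_sided(pred, distance_field(target)) + one_sided(target, distance_field(pred))
```

Because each term multiplies a mask by a distance field defined on the same pixel grid, no spatial-to-pointset conversion is needed. Two identical skeleton masks give a loss of zero, and two vertical lines one pixel apart give a loss of 2.0 (1.0 in each direction).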
Advancing Limited-Angle CT Reconstruction Through Diffusion-Based Sinogram Completion
Jiaqi Guo, Santiago Lopez-Tapia, Aggelos K. Katsaggelos
Limited-Angle Computed Tomography (LACT) often faces significant challenges due to missing angular information. Unlike previous methods that operate in the image domain, we propose a new method that focuses on sinogram inpainting. We leverage MR-SDEs, a variant of diffusion models that characterize the diffusion process with mean-reverting stochastic differential equations, to fill in missing angular data at the projection level. Furthermore, by combining distillation with a constraint on the model output based on the pseudo-inverse of the inpainting matrix, the diffusion process is accelerated and completed in a single step, enabling efficient and accurate sinogram completion. Quantitative experimental results demonstrate that the proposed method achieves state-of-the-art performance in both perceptual quality and fidelity, offering a promising solution for LACT reconstruction in scientific and clinical applications.
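One common way to constrain a completed sinogram with the pseudo-inverse of the inpainting matrix is a data-consistency projection. The sketch below assumes the inpainting matrix A is a binary row-selection mask over projection angles, so that its pseudo-inverse is its transpose and the projection x - A⁺(Ax - y) reduces to overwriting the measured angles with their measurements. Whether the paper applies the constraint in exactly this form is an assumption, and `enforce_measured_views` is a hypothetical name.

```python
def enforce_measured_views(predicted, measured, observed):
    """Keep measured projections where available, model output elsewhere.

    predicted: full sinogram from the model, one row per projection angle
    measured:  sinogram of the same shape; only rows at observed angles are used
    observed:  one boolean per angle, True where the angle was acquired
    """
    if not (len(predicted) == len(measured) == len(observed)):
        raise ValueError("all inputs must have one entry per projection angle")
    return [
        list(m_row) if seen else list(p_row)
        for p_row, m_row, seen in zip(predicted, measured, observed)
    ]
```

With this projection, the model can only change the missing angular range, so the completed sinogram always agrees with the acquired data.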
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.40)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- Research Report > Promising Solution (0.34)
- Research Report > New Finding (0.34)